Question 3.1

data = read_excel("Chapter3_exercises_data.xlsx", sheet = "Exercise 1")
## New names:
## • `` -> `...4`
log_RC <- log(data$rpce)
log_I <- log(data$rdpi)
Growth_RC <- diff(log_RC)
Growth_I <- diff(log_I)
plot(data$date[-1],Growth_RC, type = "l", col = "blue", xlab = "Year", ylab = "Growth Rate (Real Consumption)", main = "Growth Rate of Real Consumption")

plot(data$date[-1],Growth_I, type = "l", col = "red", xlab = "Year", ylab = "Growth Rate (Income)", main = "Growth Rate of Disposable Income")

Growth rate of real consumption is less volatile than growth rate of disposable income, probably because the permanent income model relates to this phenomenon by postulating that current and expected future income levels (together lifetime income) drives consumption (expenditure) patterns, but is smoothed over time. So if someone has an increase in income, they will smooth that gain over their lifetime and not spend it proportionally immediately. Thus in this example, one would change their consumption in magnitude less in response to an the income change. This data is evidence of the permanent income hypothesis.

model <- lm(Growth_RC ~ Growth_I)

# Print the summary of the regression model
summary(model)
## 
## Call:
## lm(formula = Growth_RC ~ Growth_I)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0304050 -0.0029792  0.0001606  0.0030383  0.0244504 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0022543  0.0002242  10.056  < 2e-16 ***
## Growth_I    0.1745175  0.0292014   5.976  3.8e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.005317 on 637 degrees of freedom
## Multiple R-squared:  0.05309,    Adjusted R-squared:  0.05161 
## F-statistic: 35.72 on 1 and 637 DF,  p-value: 3.799e-09

The estimated linear equation is that Growth of consumption = 0.0022543 + 0.1745175 Growth of Income. This suggests that a disposable income growth expects a positive change in consumption. The t values and p values suggest statistical significance, so at the 95% level income growth appears to positively drive expenditure. This R2 score is also very low, meaning that our independent variable of income growth only accounts for about 5% of total variation. Our coefficient of rdpi_growth means that a 1% growth in income is expected to give a 0.17% growth in consumption. Because 0.17% < 1%, this aligns with the permanent income hypothesis.

Growth_I_lag <- lag(Growth_I)
lr_growth_lag <- lm(Growth_RC~Growth_I+Growth_I_lag)
summary(lr_growth_lag)
## 
## Call:
## lm(formula = Growth_RC ~ Growth_I + Growth_I_lag)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0300818 -0.0028874 -0.0000051  0.0029768  0.0255088 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0019889  0.0002405   8.269 7.90e-16 ***
## Growth_I     0.1872175  0.0293870   6.371 3.61e-10 ***
## Growth_I_lag 0.0828418  0.0293865   2.819  0.00497 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.005284 on 635 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.06476,    Adjusted R-squared:  0.06182 
## F-statistic: 21.99 on 2 and 635 DF,  p-value: 5.855e-10

With the lagged consumption growth we do see a small increase in the Growth_I coefficient, as well as a positive coefficient for the lag parameter. The actual coefficient of Growth_I_lag means that a 1% increase in income in the previous period is expected to give a 0.08% increase in consumption in the current period. The t values and p values of the intercept and the Growth_I variable remain to suggest significance, but the lagged parameter is just passing by at the 95% significance level. This finding does not present strong evidence that last periods growth in income has a significant effect on consumption pattern, which coincides with the permanent income hypothesis. Also, the Adjusted R2 rose to 0.062, but this increase is not notably large.

Question 3.3

real_gdp_data <- read_excel("Chapter3_exercises_data.xlsx", sheet = "Exercise 3a")
## New names:
## • `` -> `...3`
read_gdp_mean <- mean(real_gdp_data$rgdp)

ggplot(data=real_gdp_data, mapping=aes(date, rgdp)) +
  geom_line(color='blue', lwd=1) +
  geom_hline(yintercept=read_gdp_mean, linetype='dashed') +
  ggtitle('US Real GDP') +
  xlab('Year') +
  ylab('RGDP')

Definition: Value of goods and services produced in the US adjusted for inflation.
Periodicity: Quarters, 1947-2012.
Units: USD billions chain weighted.
Stationary: There is a clear upward trend with some small local dips and peaks, so this time series is not first (second) order weakly stationary.

exchange_rate <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 3b")
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
exchange_rate_mean <- mean(exchange_rate$jpy_usd)

ggplot(data=exchange_rate, mapping=aes(DATE, jpy_usd)) +
  geom_line(color='blue', lwd=1) +
  geom_hline(yintercept=exchange_rate_mean, linetype='dashed') +
  ggtitle('Exchange Rate of Yen vs USD') +
  xlab('Year') +
  ylab('Rate')

Definition: The value of yen (foreign currency) that is equal to 1 USD.
Periodicity: Monthly, 1971-01-04 to 2012-06-01.
Units: Rate of Yen to 1 USD.
Stationary: There is a clear downward trend with some small and moderate local dips and peaks, so this time series is also not first (second) order weakly stationary.

maturity_yield <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 3c")
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
maturity_yield_mean <- mean(maturity_yield$CMRate10Yr, na.rm=TRUE)

# Removing zero values under assumption that these should be NA.
maturity_yield[maturity_yield==0] <- NA

ggplot(data=maturity_yield, mapping=aes(DATE, CMRate10Yr)) +
  geom_line(color='blue', lwd=1) +
  geom_hline(yintercept=maturity_yield_mean, linetype='dashed') +
  ggtitle('10-year Treasury Constant Maturity Yield') +
  xlab('Rate') +
  ylab('Year')
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

Definition: Yields on actively traded non-inflation-indexed issues adjusted to constant maturities.
Periodicity: Daily, 1962-01-02 to 2012-06-07.
Units: Rate.
Stationary: This plot is less clear in respect to any trend. Before the mid 1980’s there is an upward trend, but after there is a downward trend. There is does not appear to be a meaningful mean of this series nor is there a seemingly constant degree of variance in the cycles. This series is doubtful to be first (second) order weakly stationary.

unemployment <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 3d")
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
unemployment_mean <- mean(unemployment$unemrate)

ggplot(data=unemployment, mapping=aes(DATE, unemrate)) +
  geom_line(color='blue', lwd=1) +
  geom_hline(yintercept=unemployment_mean, linetype='dashed') +
  ggtitle('US Unemployment Rate') +
  xlab('Year') +
  ylab('Rate')

Definition: The percent of unemployed people over the labor force. The US Labor force includes those 16 years of age and up, not in institutions, not on active military duty, residing in the United States.
Periodicity: Monthly, 1948-01-01 to 2012-05-01.
Units: Rate.
Stationary: This plot has an overall upward trend, but does in fact fluctuate about the mean more than the previous series. It is unclear if this is first order weakly stationary. Since the variances are more obviously not constant, I would be confident enough to at least claim that it is not second order weakly stationary.

Question 3.1

data = read_excel("3.5.xlsx")
data
plot(data$Date,data$GDP, type = "l", col = "blue", xlab = "Date", ylab = "GDP", main = "GDP trend")

The underlying stochastic process is not weakly stationary. The upward trend indicates that the process must have different means in different periods of time, so that it is not first order stationary.

# Calculate the growth rate of nominal GDP
data$g1t <- 100 * ((data$GDP-lag(data$GDP)) / lag(data$GDP))
# Print the updated table with the new GDP growth column
print(data)
## # A tibble: 16 × 4
##    Date                   GDP `ln(GDP)`     g1t
##    <dttm>               <dbl>     <dbl>   <dbl>
##  1 2001-01-01 00:00:00 10022.      9.21 NA     
##  2 2001-04-01 00:00:00 10129.      9.22  1.07  
##  3 2001-07-01 00:00:00 10135.      9.22  0.0612
##  4 2001-10-01 00:00:00 10226.      9.23  0.900 
##  5 2002-01-01 00:00:00 10338.      9.24  1.09  
##  6 2002-04-01 00:00:00 10446.      9.25  1.04  
##  7 2002-07-01 00:00:00 10546.      9.26  0.965 
##  8 2002-10-01 00:00:00 10618.      9.27  0.673 
##  9 2003-01-01 00:00:00 10745.      9.28  1.20  
## 10 2003-04-01 00:00:00 10884       9.30  1.30  
## 11 2003-07-01 00:00:00 11117.      9.32  2.14  
## 12 2003-10-01 00:00:00 11271.      9.33  1.39  
## 13 2004-01-01 00:00:00 11473.      9.35  1.79  
## 14 2004-04-01 00:00:00 11658.      9.36  1.61  
## 15 2004-07-01 00:00:00 11815.      9.38  1.35  
## 16 2004-10-01 00:00:00 11995.      9.39  1.52
plot(data$Date, data$`ln(GDP)`, type = "l", col = "blue", xlab = "Date", ylab = "ln(GDP)", main = "ln GDP trend")

The logarithmic transformation helps to stabilize the variance. The figures above show that the log transformation does not affect the trending behavior of the GDP series, and therefore, yt is not first order stationary but it is smoother than the original GDP series.

# Calculate the growth rate of nominal GDP
data$g2t <- 100 * ((data$`ln(GDP)`-lag(data$`ln(GDP)`)))
# Print the updated table with the new GDP growth column
print(data)
## # A tibble: 16 × 5
##    Date                   GDP `ln(GDP)`     g1t     g2t
##    <dttm>               <dbl>     <dbl>   <dbl>   <dbl>
##  1 2001-01-01 00:00:00 10022.      9.21 NA      NA     
##  2 2001-04-01 00:00:00 10129.      9.22  1.07    1.07  
##  3 2001-07-01 00:00:00 10135.      9.22  0.0612  0.0612
##  4 2001-10-01 00:00:00 10226.      9.23  0.900   0.896 
##  5 2002-01-01 00:00:00 10338.      9.24  1.09    1.09  
##  6 2002-04-01 00:00:00 10446.      9.25  1.04    1.03  
##  7 2002-07-01 00:00:00 10546.      9.26  0.965   0.960 
##  8 2002-10-01 00:00:00 10618.      9.27  0.673   0.671 
##  9 2003-01-01 00:00:00 10745.      9.28  1.20    1.19  
## 10 2003-04-01 00:00:00 10884       9.30  1.30    1.29  
## 11 2003-07-01 00:00:00 11117.      9.32  2.14    2.12  
## 12 2003-10-01 00:00:00 11271.      9.33  1.39    1.38  
## 13 2004-01-01 00:00:00 11473.      9.35  1.79    1.77  
## 14 2004-04-01 00:00:00 11658.      9.36  1.61    1.60  
## 15 2004-07-01 00:00:00 11815.      9.38  1.35    1.34  
## 16 2004-10-01 00:00:00 11995.      9.39  1.52    1.51

From the third and the fifth columns of the table, we observe that there are not significant differences between g1t and g2t, so that the log-difference used in d. is a good approximation to compute growth rates.

Question 3.7

Data <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 7")
Data$pt <- log(Data$`SP500 Index`)
Data$Daily_return <- (Data$pt - lag(Data$pt))
Data #Daily return shown as the fourth column of data 
# Compute sample moments
mean_return <- mean(Data$Daily_return, na.rm = TRUE)
variance_return <- var(Data$Daily_return, na.rm = TRUE)
skewness_return <- moments::skewness(Data$Daily_return, na.rm = TRUE)
kurtosis_return <- moments::kurtosis(Data$Daily_return, na.rm = TRUE)

# Print the computed sample moments
cat("Mean of Daily Returns:", mean_return, "\n")
## Mean of Daily Returns: 3.213237e-05
cat("Variance of Daily Returns:", variance_return, "\n")
## Variance of Daily Returns: 0.000206988
cat("Skewness of Daily Returns:", skewness_return, "\n")
## Skewness of Daily Returns: -0.341148
cat("Kurtosis of Daily Returns:", kurtosis_return, "\n")
## Kurtosis of Daily Returns: 11.36724
# Plot histogram of Daily returns
hist(Data$Daily_return, breaks = 30, main = "Histogram of Daily Returns", xlab = "Daily Returns")

# Create lagged versions of Daily Return
Data$Rt_minus_1 <- lag(Data$Daily_return)
Data$Rt_minus_2 <- lag(Data$Rt_minus_1)
Data$Rt_minus_3 <- lag(Data$Rt_minus_2)
Data$Rt_minus_4 <- lag(Data$Rt_minus_3)

# Plot Rt against Rt−1, Rt−2, Rt−3, and Rt−4
plot(Data$Rt_minus_1, Data$Daily_return, main = "Rt vs. Rt−1", xlab = "Rt−1", ylab = "Rt", col = "blue")

plot(Data$Rt_minus_2, Data$Daily_return, main = "Rt vs. Rt−2", xlab = "Rt−2", ylab = "Rt", col = "green")

plot(Data$Rt_minus_3, Data$Daily_return, main = "Rt vs. Rt−3", xlab = "Rt−3", ylab = "Rt", col = "red")

plot(Data$Rt_minus_4, Data$Daily_return, main = "Rt vs. Rt−4", xlab = "Rt−4", ylab = "Rt", col = "purple")

I cannot discern any pattern in any of the four graphs.

Section 2: Forecasting: Principles and Practice

Question 2.8

private = us_employment %>%
  filter(Title == "Total Private") %>%
  ungroup()

autoplot(private, Employed) + labs(title="Total Private Employed")

gg_season(private, Employed) + labs(title="Total Private Employed")

gg_subseries(private, Employed) + labs(title="Total Private Employed")

gg_lag(private, Employed) + labs(title="Total Private Employed")

autoplot(ACF(private, Employed)) + labs(title="Total Private Employed")

By focusing on the first graph (autoplot), we indeed could find out a strong upward trend and some seasonality, as it fluctuates regularly. This is also proven by looking at the ACF plot. However, by looking at the seasonal plot, the curves are quite flat, which means there might not be strong seasonality appearing. For employment data, this pattern might be good. We can also see that the fluctuation in autoplot is small, which corresponds to this. It seems that there is some cyclicity here, as the line indeed goes up and down for some non-fixed period in the autograph. We may also see that this time series data is in monthly frequency. One unusual period could be 2008-2010, when there is a big decrease in the Employed variable. Something might happen at that period, such as the famous financial crisis.

autoplot(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 20 rows containing missing values (gg_lag).

autoplot(ACF(aus_production, Bricks)) + labs(title = "Bricks")

By looking at the autoplot graph, we indeed can discover seasonality of the time series within years. We can also find out cyclicity, as the curve goes up and down in unfixed time periods. Linear trends in this case might not be appropriate, but quadratic trends might be useful. We can also find out that this series is in quarterly frequency. By examining the seasonal plot and the subseries plot, we can see that the Bricks variable indeed changes in some regular way throughout quarters in a year. For example, the number of Bricks in the second and third quarter is higher than the rest. This may imply a higher demand of bricks during that period. These are all signs of seasonal patterns. Periods 1973-1975 and 1982-1983 are quire unusual, because the number of Bricks drop at a very high degree in these periods.

autoplot(pelt, Hare) + labs(title = "Hare")

gg_subseries(pelt, Hare) + labs(title = "Hare")

gg_lag(pelt, Hare) + labs(title = "Hare")

autoplot(ACF(pelt, Hare)) + labs(title = "Hare")

Due to some reasons, gg_season() doesn’t work here, but we can also make conclusions on seasonality based on the other four graphs. By checking the autograph and the lag plot, it’s hard to say that any seasonality exists. This mighe be because of the time frequency we choose. Yearly data might not be satisfying. It’s also very hard to determine a trend, as it might only be a flat line. But it seems like some cyclicity exists. It’s quite a common pattern that number of Hare increases a lot in a short period then decreases a lot over time. This might relate to the habit pattern of Hare. It seems like there are no unusual years, as a big increase and decrease in number of Hare is a common pattern.

H02 = PBS %>%
  filter(ATC2 == "H02") %>%
  ungroup()

autoplot(H02, Cost) + labs(title = "H02 Cost")

gg_season(H02, Cost) + labs(title = "H02 Cost")

gg_subseries(H02, Cost) + labs(title = "H02 Cost")

autoplot(ACF(H02, Cost)) + labs(title = "H02 Cost")

Because our dataset has more than 1 index/group, gg_lag() cannot handle this case. However, we can still use the rest 4 graphs to figure out if seasonality presents in our time series data. Since the data has been grouped into 4 groups, we should see the patterns of each of them. We can see that our data is in monthly frequency. By checking the autograph, we can see strong seasonality in three of them except the group General/Co-Payments. Indeed, as we check the seasonal plot, this group has the most chaotic curves, which doesn’t show evidence of seasonality. All of them don’t have a clear sign of cyclicity, and it’s hard to determine the trend for all of them except the group of Concessional/Co-payments, which may have an upward trend. This series tells us that costs among different groups are different, which is a reasonable conclusion. One unusual thing is that the volatility for 2 groups of Concessional are higher than the other two groups. This might be related to the concession type.

autoplot(us_gasoline, Barrels) + labs(title = "Barrels of Oil")

gg_season(us_gasoline, Barrels) + labs(title = "Barrels of Oil")

gg_subseries(us_gasoline, Barrels) + labs(title = "Barrels of Oil")

gg_lag(us_gasoline, Barrels) + labs(title = "Barrels of Oil")

autoplot(ACF(us_gasoline, Barrels)) + labs(title = "Barrels of Oil")

We can see that the time series has weekly frequency. It’s very hard to see the seasonality from just looking at the graphs, as the frequency is high, but we can still recognize some pattern of seasonality. If we look at the ACF curve, it is the evidence of strong seasonality and strong trend. The trend is therefore clear, which an upward linear line would be appropriate. There are some signs of cyclicity as well. The data might imply that the demand for barrels of oil changes over time in a pattern, which might be related to the production. It seems like the volatility of barrels first remains constant and high, but decreases since 2004, and becomes high and constant again in about 2010. This pattern is an unusual fact.

Question 3.2

usa_data <- subset(global_economy, Country == "United States")
usa_data
autoplot(usa_data)+ labs(title = "USAGDP")
## Plot variable not specified, automatically selected `.vars = GDP`

global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP/Population) + labs(title = "United States GDP Per Capita")

Performed a population transformation to obtain per-capita data due to the potential impact of population changes on GDP. The overall trend is the same as the two plots are very similar.

Vic <- subset(aus_livestock, State == "Victoria" & Animal == "Bulls, bullocks and steers")
Vic %>%
  autoplot(Count) + labs(title = "Slaughter of Vicorian Bulls, Bullocks, and Steers")

No transformation

vic_elec %>% autoplot(Demand) + labs(title = "Victorian Electricity Demand")

vic_elec %>%
group_by(Date) %>%
index_by(Date = yearweek(Time)) %>%
summarise(Demand = sum(Demand)) %>%
autoplot(Demand) + labs(title= "Weekly Victorian Electricity Demand", y = "$US (in trillions)")

Performed a calendar transformation to reflect weekly demand rather than half-hourly demand. Plotting a point for every 30 minutes makes the plot difficult to interpret because it is so cluttered that seasonality in particular is hard to observe. Plotting weekly electricity demand results in a much cleaner plot such that it is easier to see the seasonality and variation in weeks.

aus_production %>% autoplot(Gas)

lambda <- aus_production %>% features(Gas, features = guerrero) %>% pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) + labs(y = "", title = "Transformed Gas Production (lambd = 0.11)")

The variation increases with the level of the series, so a box-cox transformation helps to make all the variances similar across the whole series.

Question 3.8

set.seed(123)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(x11_dcmp) + 
  labs(title = "Decomposition of Retail Turnover using X-11")

There are a few outliers that is obvious from looking at the spikes in the irregular plot, the most significant one is the one in 2001. Another observation is that the seasonality decreaases over time.

Question 3.9

  1. There is an overall increasing trend in the number of persons in the civilian labor force in Australia. However, there were decline in the labor force around 1991 and 1992 by the significant decrease in the remainder plot, which main be due to a recession. There is some level of sesonality, although the scale of seasonality is insignificant compared to trend, so the seasonality does not have much of an influence. There is some cycles present, although insignificant.

  2. The recession of 1991/1992 is visible in the estimated components.